09. Data Normalization
Lesson 4 -- Data Normalization
Pre-processing done on FIFA dataset
Here is a link to the Google spreadsheet with the solution dataset
Create Player Ranking by Value
To create a ranking for the players by value, there were several pre-processing steps that needed to be completed.
- The numerical value needed to be extracted or parsed out from the
Valuecolumn. This required some Excel parsing using INDEX and ROW functions in columns titledcharacteranddigit. - Next the value needed to be normalized to ensure that
values in thousands (K) and millions (M) were standardized. See how the Excel IF function was used to in combination with thecharactercolumn to get the normalized value. - From there on, the player was ranked based on the new normalized value data in the
normalizedcolumn.
Create Normalized Height
- Similarly, the get the normalized height, various Excel functions were used to obtain a normalized height column titled
normalized2.
Create Normalized Length of Time since Joining
- To create the length of time since the player had joined, the
Joinedcolumn was used to create ajoined normalizedcolumn by using Excel's TODAY function.
Additional Note
To create outlier thresholds, you can use the following formulae: IQR*1.5 + Q3 value